SUMMARY



This paper outlines some of the research activities underway as part of the Air Force's Learning
Abilities Measurement Program (LAMP). The major goal of the project is to devise new models of
the nature and organization of human abilities, with the long-term goal of applying those models to
improve current personnel selection and classification systems. As an approach to this ambitious
undertaking, we have divided the activities of the project into two categories. The first category is
concerned with identifying fundamental learning abilities by determining how learners differ in their
abilities to think, remember, solve problems, and acquire knowledge and skills. From research already
completed, we have established a four-source framework that assumes that observed learner
differences are due to differences in processing speed; processing capacity; and the breadth, extent, and
accessibility of conceptual knowledge and of procedural and strategic skills. The second category of
research activities is concerned with validating new models of learning abilities. To do this, we are
building a number of computerized intelligent tutoring systems that serve as mini-courses in technical
areas such as computer programming and electronics troubleshooting. A major objective of this part
of the program is to develop principles for producing indicators of student learning progress and
achievement. These indicators will serve as the learning outcome measures against which newly
developed learning abilities tests will be evaluated in future validation studies.
I. INTRODUCTION

Considerable headway has been made during the last decade in our understanding of human
cognition. This has led to speculation that it is only a matter of time before an improved technology for
gauging individuals' intellectual proficiencies will be developed. The stakes are high: Psychological
testing of cognitive proficiency is presently widespread in industry, the schools, and the military.
Improved tests would have a profound economic impact by cutting education and training costs and
enabling a more efficient and fair system of personnel utilization. Although psychological testing
must certainly be considered one of psychology's true success stories, it is also
primarily a past accomplishment. Systematic studies of predictive validity have shown that today's
aptitude tests are no better than those available shortly after World War II (Christal, 1981; Kyllonen,
1986).

But even if it is agreed that forces are conspiring to usher in a new era of cognitive testing, there
still is considerable debate on exactly what form these new cognitive tests will take. On one side of the
debate, some argue that what cognitive psychology has to offer is a rationale and a methodology for
measuring basic information processing components (Detterman, 1986; Jensen, 1982; Posner &
McLeod, 1982). According to this view, the cognitive test battery of the future would consist of
measures of speed of retrieval from long-term memory, short-term memory scanning rate, probability
of transfer from short- to long-term storage, and the like. At the opposite end of the debate are those
who suggest that the fundamental insight of cognitive science is that cognitive skill reflects primarily
knowledge rather than general processing capabilities. This perspective has led to calls for testing
intermingled with instruction, that is, testing aimed at measuring what students know and what they have
learned in the context of their current instructional experience (Embretson, in press; Glaser, 1985).
This has been called steering testing (Lesgold, Bonar, & Ivill, 1987) or apprenticeship testing (Collins,
1986). Between these positions are those who propose new kinds of cognitive tests that are not
radically different from existing ones but perhaps richer and more diverse in what they measure (Hunt,
1982; Hunt & Pellegrino, 1984; Sternberg, 1981b).

In this paper, we provide a status report of one ongoing program of research, the Learning Abilities
Measurement Program (LAMP), that has been concerned with developing new methods for measuring
cognitive abilities. We discuss some of our early thinking on the implications of cognitive psychology
for testing, and how we have adjusted our ideas in light of data collected in our cognitive abilities
measurement (CAM) laboratory. We conclude with a brief discussion of CLASS, the Complex
Learning Assessment Laboratory, the setting in which we intend to validate the new tests.¹

II. COGNITIVE THEORY AND APTITUDE TESTING 

The idea of grounding psychological testing in cognitive theory is not entirely novel. During the
1970s and 1980s, the Air Force Office of Scientific Research (AFOSR) and, especially, the Office of
Naval Research (ONR) supported a number of basic research projects that had the explanation of
individual differences in learning and cognition as a central goal. This research largely concentrated on
the analysis of conventional aptitude tests, probably for two reasons. First, analysis of aptitude tests is
important in its own right, as an attempt to determine what it is that such tests measure. Second,
and perhaps more importantly, aptitude tests can be viewed as generic surrogates for tasks tapping
more complex, slowly developing learning skills. So goes the argument: It is difficult and extremely
expensive to identify and analyze the information processing components associated with the
acquisition of computer programming skill, and far cheaper and more efficient to analyze the seemingly
more tractable components of some aptitude test, such as an analogies test, that predicts success in
computer programming. And the fact that tests do such a good job in predicting training outcomes can
be taken as evidence that much the same cognitive components are involved in both test-taking
and learning.



¹ This paper does not review the research accomplished by William Tirre and Linda Elliott
concerning individual differences in text comprehension. Readers interested in this area are referred
to Tirre and Elliott (1987).
The wave of aptitude research that was motivated by these considerations did not lead directly to
improvements in existing aptitude testing systems, however. A number of new methods and
techniques, such as cognitive correlates analysis (Hunt, Frost, & Lunneborg, 1973) and componential
analysis (Sternberg, 1977), were developed for analyzing aptitude tests, but the application of these
methods did not suggest how the tests themselves might be improved. There have been suggestions
that cognitive tasks exported from the experimental psychologist's laboratory might somehow be used
to supplement or even replace existing aptitude tests (Carroll, 1981; Hunt, 1982; Hunt & Pellegrino,
1984; Pellegrino & Glaser, 1979; Rose & Fernandez, 1977; Snow, 1979; Sternberg, 1981b), but after
almost 10 years, the research still has not been carried out to an extent sufficient for determining
whether this is really feasible.

Probably the reason cognitive-based aptitude research has not already translated into better tests is
that this has not been a primary goal of the research. Indeed, if the creation of better tests had been
the primary goal, the approach of analyzing and decomposing existing tests would not have seemed very
promising. Even if such research efforts were completely successful, turning out "better than
anyone's wildest expectations," at best the new tests would simply duplicate the validity of existing tests.

III. LEARNING ABILITIES MEASUREMENT PROGRAM (LAMP)

In contrast to some of the aptitude research projects previously discussed, our own work in
connection with Project LAMP has from its inception been focused on the goal of developing an
improved selection and classification system. Our current efforts fall into two categories. First, we are
continuing to model basic cognitive learning skills and their interrelationships, and to explore different
methods for measuring these skills. Second, we have more recently begun thinking seriously about a
system for validating the new cognitive measures. The system involves the extraction of learning
indices, from both short-term (1 hour) and long-term (1 week) learning tasks, that will serve as criteria
against which the new cognitive measures will be validated. Although we have not yet collected data on
the long-term learning tasks, we have set up the laboratory, which consists of 30 computerized tutoring
stations. In the remainder of this paper, we discuss these two categories of ongoing LAMP research.
We begin with a discussion of studies that have attempted to measure cognitive skills.

Modeling Cognitive Skills: The Four-Source Framework

Much of our work on identifying basic learning skills has centered around what we have called the
four-source framework (Kyllonen, 1986). This is the idea that individual differences in a wide variety of
learning and performance tasks are due to differences in four underlying sources: (a) effective cognitive
processing speed; (b) effective processing capacity; and the general breadth, accessibility, and pattern of
one's (c) conceptual knowledge and (d) procedural and strategic skills. Figure 1 illustrates these
relationships.

We refer to the knowledge and skill components of this model (components [c] and [d]) as enablers,
in the sense that any learning or performance task can be characterized as consisting of a necessary set
of knowledge and skill prerequisites. We refer to the processing speed and working memory
components of the model ([a] and [b]) as mediators, in the sense that these components mediate the
degree to which the learner or problem-solver is able to use his or her knowledge and skills effectively.
We have found the four-source framework to be useful in organizing our own as well as others'
research and in monitoring our research progress. Further, although we have not yet applied it widely
in this fashion, we expect that the system will be useful for task analysis purposes.

Thus far, most of the research we have accomplished in connection with the four-source proposal
has been concerned with (a) improving the way in which we measure cognitive skills and (b)
determining the dimensionality of the skills and subskills embedded within the four-source model. We
now discuss the four components in turn.

Figure 1. Four-Source Research Framework. Performance in each of the three learning phases
(Knowledge Acquisition, Skill Acquisition, and Skill Automatization phases; right side of
figure) is presumed to be a function of the enablers (Knowledge and Skills), the mediators
(Processing Capacity and Processing Speed), and whether the prior learning phase is
complete.

Processing Speed

Considerable research on individual differences in cognition over the past 10 years has been
concerned with determining the relationship between processing speed and performance on complex
tasks, such as intelligence tests. There are a number of reasons for the high level of interest in
processing speed. One is that we now can measure it. The availability of microcomputers as testing
instruments makes it feasible to measure, with precision, response time to particular items; paper-and-pencil
tests allowed only gross estimates of response speed. Second, processing speed seems to reflect
something basic, something fundamentally a part of all mental activity, and therefore something that
might, in some sense, explain the general factor in intelligence. Third, since the beginnings of modern
cognitive psychology, processing speed has played a major role in cognitive theories by revealing the
dynamics of mental processes. Neisser's (1967) book, which is generally considered the kickoff point
for the discipline, reported primarily on reaction time studies. Finally, there are operational
performance contexts, such as the Air Traffic Controller Workstation or the cockpit, that require
efficient processing of considerable data. Understanding the relationship between processing speed
and performance in these contexts would have immediate practical payoff.

In our own laboratory, we have conducted a number of studies on processing speed that have
focused on both its psychometric properties and its relationship to performance on criterion tasks.
Studies have run the gamut in addressing both applied and basic issues. A number of early studies in
the project (reported in Kyllonen, 1985) were designed simply to address the question of whether
processing speed could be more appropriately characterized as a unitary or multidimensional construct.
That is, we addressed the question of whether some people are generally faster information processors
than others, or whether it is more appropriate to think in terms of varieties of processing speed. Both
positions can be argued for on rational grounds. Much of Jensen's work (Jensen, 1982) at least
implicitly presumes a general speed factor. But low correlations between processing speed tasks and
measures of general intelligence have led others to propose multiple, correlated processing speed
components (e.g., Detterman, 1980).

One way to address the dimensionality question is simply to measure response time on a wide
variety of cognitive tests, such as those one finds in the Educational Testing Service (ETS) kit, and
perform a factor analysis on the resulting scores. In one study (Kyllonen, Tirre, & Christal, 1985), we
did just that and found evidence both for separate reasoning, quantitative, and verbal processing speed
factors and for a higher-order general processing speed factor. Interestingly, we found that although
processing speed scores were quite reliable, at least within session, they were not related to accuracy
scores on the same tests. Timed versions of the tests thus mix these two separable components of
performance into a single score. There are problems with this approach to testing the
dimensionality question, such as how to allow for speed-accuracy trade-off, what to do with response
times when the person guessed incorrectly, and so forth. But a more substantive problem is that
although the findings are suggestive, they fall considerably short of revealing much about the processes
that produced them.
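Keeping speed and accuracy as separate scores, rather than letting a time limit merge them, can be illustrated with a minimal sketch. It assumes trial-level records of (correct, latency) for each subject; the function name and the choice to score latency as the median over correct trials only (one common way of sidestepping response times on incorrect guesses) are illustrative assumptions, not the scoring procedure used in the study.

```python
from statistics import mean, median


def speed_and_accuracy(trials):
    """Score one subject's trials separately for accuracy and speed.

    `trials` is a list of (correct, latency_ms) pairs. Returns
    (proportion correct, median latency over correct trials only).
    """
    accuracy = mean(1.0 if correct else 0.0 for correct, _ in trials)
    correct_latencies = [rt for correct, rt in trials if correct]
    if not correct_latencies:
        raise ValueError("no correct trials to score for speed")
    return accuracy, median(correct_latencies)


# Hypothetical trials for one subject: three correct responses, one incorrect guess.
trials = [(True, 900.0), (False, 1500.0), (True, 1100.0), (True, 1000.0)]
accuracy, speed = speed_and_accuracy(trials)
```

With one pair of scores per subject, the two can then be correlated across subjects, which is the kind of comparison that shows whether speed and accuracy are related on a given test.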

Thus, in subsequent work we have restricted our focus (and employed a narrower range of tasks) in
the hope of achieving a better process-oriented understanding of the generality question. In these
studies, we attempted to identify processing stages, then measure the duration of those stages for
individual subjects, and then compute the intercorrelations among stages. The procedure is best
illustrated by example. In the first study (Kyllonen, 1987), we administered a series of tasks that required subjects
simply to determine whether two words presented (e.g., happy-lose) were similar or dissimilar with
respect to valence. Happy would be considered a positive-valence word; lose would be considered a
negative-valence word. We presumed that a decision on this task was executed after a series of
preceding stages. The subject begins by encoding one of the words, then encoding the second word.
The result of the encoding process is that a symbol representing valence is deposited in working
memory for each word. The subject then compares those symbols. The result of the comparison
process is an implicit assertion that the symbols are either the same or different. A decision process
then takes the comparison result and translates it into a plan for the execution of the motor response.
A response process then executes the motor response. Through the method of precueing, which has
been used with some success in separating process components on other reaction time tasks (e.g.,
Sternberg, 1977), we were able to independently estimate the duration of each of these processing
stages.



We also administered two other versions of the task in which the only difference was that subjects
were required to decide whether (a) two digits were the same with respect to oddness or evenness, or
(b) two letters were the same with respect to vowelness or consonantness. The data analysis addressed
two questions regarding generality. First, were parallel measures of stage duration (estimates derived
from separate blocks of items) more highly intercorrelated than they were correlated with other stage
durations? This is a direct test of stage independence. Second, were durations of the same stage
estimated from tasks with different content (words, digits, or letters) more highly intercorrelated than
durations of different stages taken from same-content tasks? This is a direct test of the relative importance
of content and process. Although the analyses were rather complex, the general finding was that
processes were somewhat independent, and also general across contents. That is, fast encoders were
not necessarily fast comparers, but fast encoders on the word task were also fast encoders on the digit
task.

One of the problems with this approach to studying dimensionality is that it relies on a model of
performance that assumes serial execution of processing stages. In our more recent work (Kyllonen,
Tirre, & Christal, 1988), we have relaxed this assumption by applying, in estimating stage durations,
both models that assume serial execution and models that do not. (We also have abandoned the precueing
technique because its validity depends on the serial execution assumption.) Following
Donaldson's (1983) analysis, stage durations can be estimated in two ways. Assume an ordered set of
tasks, each of which can be characterized as requiring a proper superset of the processes of its
predecessor. For example, the following set of tasks, each of which requires processing a pair of words,
might be characterized this way: reaction time, choice reaction time, physical matching, name
matching, and semantic (meaning) matching. That is, reaction time consists only of a reaction component;
the choice task adds a decision component; the physical matching task adds comparison; name
matching adds retrieval from long-term memory; and semantic matching adds search through long-term
memory.
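The ordered task set can be written down as a small data structure and checked for the proper-superset property, which also makes explicit which component each task adds. The component labels follow the text; the list layout and the function name are illustrative choices.

```python
# Ordered task set: each task with the processing components assumed to be involved.
TASKS = [
    ("reaction time", {"reaction"}),
    ("choice reaction time", {"reaction", "decision"}),
    ("physical matching", {"reaction", "decision", "comparison"}),
    ("name matching", {"reaction", "decision", "comparison", "retrieval"}),
    ("semantic matching", {"reaction", "decision", "comparison", "retrieval", "search"}),
]


def added_components(tasks):
    """Return, for each task after the first, the components it adds to its predecessor.

    Raises ValueError if a task's components are not a proper superset of the
    previous task's components.
    """
    added = {}
    for (prev_name, prev), (name, comps) in zip(tasks, tasks[1:]):
        if not prev < comps:  # `<` on sets tests for a proper subset
            raise ValueError(f"{name!r} is not a proper superset of {prev_name!r}")
        added[name] = comps - prev
    return added
```

Each entry of the result names the stage whose duration is estimated by contrasting a task with its predecessor.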
One can estimate each of these stage durations either by subtracting latency on the predecessor
task from latency on the target task (the difference score model), or by statistically holding constant the
duration of all predecessor tasks (the part correlation model). The two models employ differing
assumptions about the relationships among task components. The difference score model assumes
nothing about the relationship between the duration of the target component (e.g., comparison) and
the duration of the predecessor task (e.g., choice reaction time); this correlation is a parameter
to be estimated. But the cost of this flexibility is the assumption that the duration of the target
component (e.g., comparison) remains constant, regardless of whether the component is embedded in
the physical matching task, the name matching task, or whatever. Conceptually, there are two problems
with this assumption. Consider the reaction component. First, it may be that reaction is rapid when nothing
else is going on, as on the simple reaction time task, but slow when it follows complex processing, as on
the semantic matching task. Second, it could be the opposite, due to parallel processing: Reaction appears
slow on the simple reaction time task because it is the only process executing, but on the meaning
identity (semantic matching) task the reaction begins before the decision ends, and thus appears fast
(as is specified in cascade models; McClelland, 1979).
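Under the difference score model, a stage duration is simply each subject's latency on the target task minus that subject's latency on the predecessor task. For instance, subtracting physical matching latency from name matching latency yields an estimate of the retrieval stage. A minimal sketch, with hypothetical latencies and function name:

```python
def difference_scores(target_rt, predecessor_rt):
    """Difference score model: per-subject stage duration = target RT - predecessor RT.

    Both lists hold one mean latency per subject, in the same subject order.
    The model assumes the stage takes the same time whichever task embeds it.
    """
    if len(target_rt) != len(predecessor_rt):
        raise ValueError("latency lists must cover the same subjects")
    return [t - p for t, p in zip(target_rt, predecessor_rt)]


# Hypothetical mean latencies (ms) for four subjects.
physical_matching = [500, 520, 480, 610]
name_matching = [580, 590, 600, 700]
retrieval = difference_scores(name_matching, physical_matching)
```

The resulting per-subject scores can then be correlated with a criterion such as a vocabulary score.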

The part correlation model avoids this assumption and allows for variability in stage durations over
different tasks. This is represented as freedom for the regression weight associated with stage duration
to differ from 1.0. But in order to achieve this flexibility, the part correlation model must compensate
with an assumption not required with the difference score model. In the part correlation model, it is
assumed that the duration of the target stage is uncorrelated with the duration of the predecessor task.
For example, the duration of the comparison component in the context of the physical matching task
would be assumed to be uncorrelated with response time on the choice reaction time task.

Which of these sets of assumptions is correct, those associated with the part correlation model or
those associated with the difference score model? It is not possible to tell, but it is possible to employ
both models and then to be confident of relationships only when the models agree.
We took this approach in attempting to estimate the relationship between processing stage
durations and performance on a vocabulary test, and also on a paired-associates learning task.
Vocabulary is an interesting test case because it is a good measure of general intelligence. The current
view is that breadth of word knowledge reflects efficient learning processes in inferring word meanings
from context (Marshalek, 1981; Sternberg & Powell, 1983). An additional motivation for looking at
vocabulary as a criterion was that a considerable literature has evolved from Hunt and colleagues'
(Hunt et al., 1973) early finding of a relationship between the duration of the retrieval stage (as
estimated by the difference between response time on the name and physical matching tasks) and
verbal ability.

Contrary to Hunt et al. and other previous work, however, we did not find much of a relationship
between retrieval speed and vocabulary (r = .17, N = 710), but we did find a strong relationship
between search speed and vocabulary (r = .49). Subjects capable of quickly accessing semantic
attributes of words, controlling for how quickly they did other kinds of information processing, had
larger vocabularies than did other subjects.

We found a similar relationship between processing speed and learning, but only in particular
circumstances, namely, when study time on the learning task was extremely short (.5 to 2 seconds per
pair). The component analysis again made it possible to isolate the semantic search component, as
opposed to other processing speed components, as the one consistently most critical in determining
learning success. Over a number of studies (which varied in block size, recognition vs. recall
responses, etc.), the correlation between learning success and response time on the meaning identity
test, controlling for (or eliminating by subtraction) response time on other information processing tests,
ranged from r = .30 to r = .50. In some studies, other information processing speed components
predicted learning outcomes, but only inconsistently.

We currently are engaged in two lines of extension to the processing speed work. One is motivated
by the idea that information processing speed may be closely tied to working memory capacity insofar
as both measures reflect the dynamic activation level of a memory trace (Woltz, 1987). An intriguing
implication of this idea has to do with individual differences in the maintenance of activation. In most
learning tasks, we do not simply access a term once and only once. Rather, there is redundancy in
instructional materials, which allows for multiple accesses of a concept in an instructional episode.
Thus, the important search speed variable is not merely how quickly a concept can be accessed on first
encounter, but also how quickly the concept can be re-accessed on second, third, and fourth
encounters. Woltz (1987) has shown not only that subsequent accesses are much faster than first
encounters, but also that there are substantial individual differences in the amount of improvement in speed
from first to subsequent encounters. Interestingly, those who benefit most are not necessarily those
who are quickest initially. We explore further ramifications of the idea of activation as a concept
underlying working memory capacity in the next section.
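The distinction between initial access speed and the gain from re-access can be made concrete with a small scoring sketch. The summary below (first-access latency, mean re-access latency, and their difference) is a hypothetical scoring choice for illustration; the measures Woltz (1987) actually used are not described here.

```python
from statistics import mean


def reaccess_profile(latencies):
    """Summarize one subject's latencies for successive accesses of the same concept.

    latencies[0] is the first encounter; the remaining values are re-accesses.
    Returns (first-access latency, mean re-access latency, speed-up).
    """
    if len(latencies) < 2:
        raise ValueError("need a first access and at least one re-access")
    first = latencies[0]
    later = mean(latencies[1:])
    return first, later, first - later


# Hypothetical subjects: the initially faster one gains less from re-access.
fast_start = reaccess_profile([700, 620, 600, 580])
slow_start = reaccess_profile([900, 600, 570, 540])
```

Scored this way, initial speed and speed-up are separate per-subject variables, so their relationship across subjects can be examined directly.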

A second extension to the processing speed work involves the exploration of reaction time
distributions as a way of determining how subjects process items. There is some work (Hockley, 1984;
Ratcliff & Murdock, 1976) suggesting that reaction time on simple tasks actually reflects two
underlying components: a normally distributed processing component (e.g., true comparison time) and
an exponentially distributed waiting time component (e.g., time lost to attention lapses and the like). We
are currently investigating the feasibility of estimating these reaction time components and determining
whether they reflect reliably different processes (Fairbank, in preparation).
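The sum of a normal and an exponential component is the ex-Gaussian distribution, with parameters mu and sigma for the normal part and tau for the exponential mean. Its mean is mu + tau, its variance is sigma² + tau², and its skewness is 2tau³ / (sigma² + tau²)^(3/2), so the three sample moments can be solved for the three parameters. The sketch below uses this method of moments purely for brevity (maximum likelihood is often preferred in practice); it is not a description of the procedure in Fairbank (in preparation), and the simulated latencies are not data.

```python
import random
from math import fsum, sqrt


def exgaussian_moments(rts):
    """Method-of-moments estimates (mu, sigma, tau) for an ex-Gaussian distribution.

    Ex-Gaussian: RT = Normal(mu, sigma) + Exponential(mean tau), so
    mean = mu + tau, variance = sigma**2 + tau**2,
    skewness = 2 * tau**3 / (sigma**2 + tau**2) ** 1.5.
    """
    n = len(rts)
    m = fsum(rts) / n
    var = fsum((x - m) ** 2 for x in rts) / n
    third = fsum((x - m) ** 3 for x in rts) / n
    skew = third / var ** 1.5
    if skew <= 0:
        raise ValueError("ex-Gaussian fit needs positive sample skewness")
    tau = sqrt(var) * (skew / 2) ** (1 / 3)   # invert the skewness formula
    sigma = sqrt(max(var - tau ** 2, 0.0))    # clamp: skew > 2 would make this negative
    return m - tau, sigma, tau


# Simulated (not real) latencies: Normal(400, 40) plus Exponential with mean 100 ms.
rng = random.Random(1)
simulated = [rng.gauss(400, 40) + rng.expovariate(1 / 100) for _ in range(200_000)]
mu_hat, sigma_hat, tau_hat = exgaussian_moments(simulated)
```

Moment estimates of tau and sigma are noisy for small samples, which is one reason reliability of the separate components is an open question.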

In summary, we are continuing to explore a number of mathematical models for identifying
component processing speeds, and for determining the relationships among different kinds of
processing. One benefit of this kind of analysis is that it enables the determination of whether
processing speed is a single construct or whether there are multiple varieties of processing speed (the
latter appears to be the case). The implication for test development concerns how processing speed
should be measured and how many different kinds of tests will be necessary to measure it.

A second benefit of this kind of analysis is that it allows one to determine what kind of cognitive
processing affects learning (in different contexts). One result is that general reaction speed appears
to be less closely related, and therefore less fundamental, to learning than might be expected on the basis
of work by Jensen and others. We have found relationships between basic reaction time and learning,
but the particular component of speed of searching semantic memory appears to be the more critical
predictor of verbal learning success. This is shown both in studies employing vocabulary scores as a
criterion and in those employing a highly speeded presentation of material to be learned. (Perhaps
both tasks reflect the learner's ability to quickly elaborate on the stimulus material.)

Processing Capacity 

Although much of the early work on the project was concerned with response time, we recently
have begun focusing more attention on similar kinds of analyses of working memory capacity. It now
appears, not only on the basis of our own work (Kyllonen, Stephens, & Woltz, 1988; Woltz & Christal,
1985) but on work from a number of laboratories (Anderson & Jeffries, 1985; Daneman & Carpenter,
1980; Hitch, 1978), that this component of the information processing system is responsible for learner
differences on a wide variety of learning tasks.

In keeping with contemporary views of the human cognitive architecture, we propose that working
memory may be defined as that portion of memory currently in a highly active or accessible state; that
is, whatever is being processed or attended to at any given time. The individual differences corollary is
that greater working memory capacity should be associated with greater attentional and learning
capabilities. Woltz (1987) has pointed out that this quite general description of working memory
capacity is realized in the literature in two rather different forms, which we will refer to as the
processing workspace and activation capacity models.

The processing workspace model of working memory, due largely to the work of Baddeley and Hitch
(1974), proposes a limited, consciously controlled, short-term memory capable of storing roughly three
to nine items simultaneously. The capacity of this structure is determined mostly by how efficiently one
processes new incoming information. Much of our work on working memory to date has consisted of
the application of the processing workspace model to the development of working memory capacity
tasks. The guiding construction principle is that the task requires the retention of some information
while simultaneously requiring the processing or transformation of other information. This principle is
consistent with Baddeley and Hitch's (1974) original definition, and seems on the surface to lend itself
readily to ecologically valid tests of memory capacity insofar as much of learning demands
simultaneous retention and processing. In contrast, what is required on span tests seems contrived and
not typical of what people actually do when engaged in realistic learning.

Figure 2 shows sample items from various tests developed in our laboratory. In the "ABCD Test,"
the subject is informed that all items involve two sets of letters. The first set is defined as the letters A
and B, and the second set as the letters C and D. The subject is then presented three statement frames
that constrain the ordering of the four letters. In the item pictured, for example, the subject is
presented a frame which states that C follows D. The subject next is presented a frame which states
that Set 1 precedes Set 2. The subject is expected at this point to note that the letters A and B will
precede the letters C and D in the final list. On the third frame, the subject is informed that B follows
A. The frames are presented successively, and the subject cannot look back to retrieve previous
statements. From the three assertions, the subject would be expected to generate the proper ordering
of the four letters, ABDC. The test probe is then presented, and the subject responds by selecting one
of the eight possible orders (two orders of the sets times two orders within each set), presented as
multiple-choice alternatives.
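The logic of an ABCD item can be checked by enumerating the eight candidate orders and keeping those consistent with every frame. The frame encoding as (left, relation, right) tuples and the function names are hypothetical; the sketch verifies the correct answer and does not model how subjects hold frames in memory.

```python
from itertools import permutations

# Letter sets as defined in the ABCD Test instructions.
SETS = {"Set 1": ("A", "B"), "Set 2": ("C", "D")}


def candidate_orders():
    """All eight orders in which the letters of each set stay adjacent."""
    orders = []
    for first, second in permutations(SETS):          # which set comes first
        for x in permutations(SETS[first]):           # order within the first set
            for y in permutations(SETS[second]):      # order within the second set
                orders.append("".join(x + y))
    return orders


def position(order, item):
    """Index of a letter, or of the first letter of a named set, within `order`."""
    return min(order.index(ch) for ch in SETS.get(item, (item,)))


def satisfies(order, frame):
    """Check one (left, relation, right) frame; relation is "precedes" or "follows"."""
    left, relation, right = frame
    before = position(order, left) < position(order, right)
    return before if relation == "precedes" else not before


def solve(frames):
    """Candidate orders consistent with every frame."""
    return [o for o in candidate_orders() if all(satisfies(o, f) for f in frames)]


# The item described above, frame by frame.
abcd_item = [("C", "follows", "D"), ("Set 1", "precedes", "Set 2"), ("B", "follows", "A")]
```

Each frame eliminates half of the remaining candidates, so three frames single out one of the eight alternatives.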

A second test, the "ABC Test," also involves successive presentation of instruction frames; only
here, the instruction frames are assignments of either values (e.g., A = 3), expressions (e.g., A = 24 - 17),
or equations (e.g., A = B / C). In the item pictured, the subject first sees that A gets the value of
B divided by 7. The subject does not yet know what B is and so must remember the equation. The
next frame states that C gets the value of B plus 4. Again, the subject still does not know the value of B
and so must remember the equation. Finally, the subject is shown that B is 13 minus 9, and this allows
him or her to solve for C and A. But in order to do so the subject must remember the equations for C
and A. The subject is then tested for which values he or she can remember.
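The dependency structure of an ABC item can be sketched as a resolver that holds any frame whose operands are not yet known and retries it once later frames supply the missing values, mirroring what the subject must keep in memory. The (target, left, op, right) frame format and the function name are hypothetical. Exact fractions are used so that non-integer results are represented exactly; with the item as stated above, A comes out as 4/7.

```python
import operator
from fractions import Fraction

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def resolve(frames):
    """Resolve ABC-style frames of the form (target, left, op, right).

    Operands are integers or variable names. A frame whose operands are not yet
    known is held, as the subject must hold it in memory, and retried after
    later frames supply the missing values.
    """
    values, pending = {}, list(frames)
    progress = True
    while pending and progress:
        progress = False
        for frame in list(pending):
            target, left, op, right = frame
            operands = [values.get(x) if isinstance(x, str) else Fraction(x)
                        for x in (left, right)]
            if None not in operands:
                values[target] = OPS[op](*operands)
                pending.remove(frame)
                progress = True
    if pending:
        raise ValueError(f"unresolvable frames: {pending}")
    return values


# The item as described: A = B / 7, then C = B + 4, then B = 13 - 9.
abc_item = [("A", "B", "/", 7), ("C", "B", "+", 4), ("B", 13, "-", 9)]
```

The frames held in `pending` at any moment correspond to the equations the subject must retain while processing later frames.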

Figure 2. Sample Test Items Measuring Working Memory Capacity. Panels: ABCD Test, ABC Test, Alpha
Recoding Test, and Mental Arithmetic Test. In the Mental Arithmetic item, the subject is given 2 seconds
to encode the problem, after which the screen goes blank; the subject presses the space bar when he or
she has mentally solved the problem and then selects the answer from 6 alternatives within 3 seconds.
Test results were analyzed in Christal (1987).

In the third test, the "Alpha Recoding Test," the subject is shown one, two, or three random
letters, one at a time on successive frames. On the next frame, the subject is instructed either to add or
to subtract 1, 2, or 3 (n). Add and subtract in this context mean to determine which letter follows or
precedes each of the target letters by n positions. After mentally recoding all the letters, the subject
presses the space bar and enters the answer; when there are several letters, the subject enters them as
a set. The other test shown in Figure 2, the "Mental Arithmetic Test," is self-explanatory.
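The recoding rule itself is a simple alphabet shift, which the following sketch makes explicit. It assumes, since the text does not say, that items never run past A or Z; such a shift raises an error here rather than wrapping around. The letter strings in the usage lines are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase


def recode(letters: str, n: int, operation: str) -> str:
    """Shift each letter n positions forward ("add") or backward ("subtract").

    Assumes items never run past A or Z; such a shift raises ValueError
    rather than wrapping around.
    """
    if operation not in ("add", "subtract"):
        raise ValueError(f"unknown operation: {operation}")
    step = n if operation == "add" else -n
    result = []
    for letter in letters:
        position = ALPHABET.index(letter) + step
        if not 0 <= position < len(ALPHABET):
            raise ValueError(f"{letter} shifted by {step} leaves the alphabet")
        result.append(ALPHABET[position])
    return "".join(result)


print(recode("BKR", 2, "add"))      # DMT
print(recode("D", 3, "subtract"))   # A
```

The computation is trivial for a program; the difficulty for the subject lies in holding the not-yet-recoded letters while transforming the others.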

As with information processing speed, an important initial question to be asked regarding
performance on these kinds of tasks is whether working memory capacity is a unitary or
multidimensional construct. A related question concerns the relationship between working memory
and performance on other, more conventional aptitude tests. We addressed both questions in a
recently completed large-scale correlational study (Christal, 1987). We administered the tests shown in
Figure 2, along with additional measures such as Memory Span, the AB Sentence-Picture Verification
Test† (Baddeley, 1968), and the Sunday-Tuesday Test‡ (Hunt et al., 1973). Additionally, we had
available subjects' scores on the Armed Services Vocational Aptitude Battery (ASVAB), which consists
of 10 paper-and-pencil subtests, such as Word Knowledge, Paragraph Comprehension, Numerical
Operations (Number Facts), and General Science Information.

† This test requires subjects to judge whether a sentence such as "A is not preceded by B" matches a
string such as "BA."

‡ This test requires subjects to perform base 7 addition on days-of-the-week values, with Sunday
assigned 1 (e.g., "Sunday + Tuesday = Wednesday").

A correlation matrix was generated from the percent correct and latency scores on the computerized
tests and the raw scores on the timed ASVAB subtests. A principal-axis factor analysis of this matrix
yielded four factors. A Working Memory factor was defined primarily by percent correct scores from the
ABC Test (r = .80), but it also was heavily loaded by the ABCD Test, the Mental Arithmetic Test, and the
other working memory measures (all of which showed r > .60). The two verbal measures, Word
Knowledge and Paragraph Comprehension, had only modest loadings on this factor (r < .15). In addition
to the Working Memory factor, separate Verbal and Speeded-Quantitative factors were extracted. The
Verbal factor was defined by ASVAB Word Knowledge (r = .77), but it also was highly loaded by both the
ABCD Test and the AB Sentence-Picture Verification Test, which may be thought of as an abridged
version of the ABCD Test (r > .50). The Speeded-Quantitative factor was defined by the Numerical
Operations subtest (r = .75), but it also was significantly loaded by latencies from the Mental Arithmetic
Test and the Sunday-Tuesday Test (r > .30). The basic pattern of results found here has been
corroborated in a recently completed follow-up study.
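For readers unfamiliar with principal-axis factoring, the sketch below shows the core iteration for a single factor in pure Python. The study extracted four factors, and its software, starting values, and stopping rule are not reported here, so this should be read as an illustration of the method only. The initial communality estimate (the largest absolute off-diagonal correlation in each row), the convergence tolerance, and the toy loadings are all assumptions. Communalities replace the 1s on the diagonal of the correlation matrix, the leading eigenvector is found by power iteration, and the process repeats until the communalities stabilize.

```python
import math


def principal_axis_one_factor(R, iterations=500, power_steps=200, tol=1e-10):
    """Extract one factor from correlation matrix R by iterated principal-axis factoring."""
    p = len(R)
    # Initial communalities: largest absolute off-diagonal correlation in each row.
    h2 = [max(abs(R[i][j]) for j in range(p) if j != i) for i in range(p)]
    loadings = [0.0] * p
    for _ in range(iterations):
        # Reduced correlation matrix: communalities on the diagonal.
        reduced = [[h2[i] if i == j else R[i][j] for j in range(p)] for i in range(p)]
        # Power iteration for the leading eigenvector.
        v = [1.0] * p
        for _ in range(power_steps):
            w = [sum(reduced[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the leading eigenvalue.
        eigenvalue = sum(v[i] * sum(reduced[i][j] * v[j] for j in range(p)) for i in range(p))
        loadings = [x * math.sqrt(eigenvalue) for x in v]
        new_h2 = [x * x for x in loadings]
        if max(abs(a - b) for a, b in zip(new_h2, h2)) < tol:
            break
        h2 = new_h2
    return loadings


# Toy correlation matrix built from known one-factor loadings (hypothetical values).
true_loadings = [0.8, 0.7, 0.6, 0.5]
R = [[1.0 if i == j else true_loadings[i] * true_loadings[j] for j in range(4)] for i in range(4)]
print([round(x, 3) for x in principal_axis_one_factor(R)])
```

Because the toy matrix is generated from an exact one-factor model, the iteration should recover the generating loadings, which makes it a convenient check that the procedure is wired correctly.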

Taken together, the results suggest the involvement of both domain knowledge (quantitative and
verbal) and a domain-independent working memory in memory test performance. In addition, it
appears from the data over the two studies that the Working Memory factor subsumes the Reasoning
factor. That is, individual differences in reasoning proficiency may be due entirely to differences in
working memory capacity. Christal notes that the factor on which all the reasoning tests in the battery
loaded highly is a Working Memory factor, in that the test that defined it, Alpha Recoding (r = .58 in
the follow-up study), does not appear to involve reasoning per se but clearly depends on working
memory capacity.

Recently, we have begun investigating an alternative to the processing workspace model, one that is
based on a different conceptualization of working memory. The activation capacity model, based
primarily on Anderson's (1983) ACT* theory, defines working memory not as a separate short-term
store but rather as a state of fluctuating activation patterns characterizing traces in long-term memory.
According to this theory, long-term memory is a network of traces, each characterized by a resting
activation level. Traces become activated when they become the focus of attention or are linked to
the focus of attention, then fade into a state of deactivation as other traces move to the center of focus.
Working memory is said to be a "matter of degree" rather than an all-or-none state, in that at any given
moment, a trace might be the focus of attention (and thereby be at a peak activation level) or it might
be continuously fading from attention if, for example, it was the focus a few seconds earlier.
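The "matter of degree" idea can be illustrated with a toy activation function: a trace sits at its resting level unless it was recently attended, in which case extra activation fades with each intervening item. The exponential decay, the parameter names, and the numbers below are illustrative assumptions and are not the ACT* equations.

```python
import math


def activation(resting: float, boost: float, items_since_focus, decay_rate: float) -> float:
    """Graded activation of a memory trace (arbitrary units).

    items_since_focus is None if the trace has not been attended recently;
    otherwise the extra activation from the last focus decays exponentially
    (an illustrative assumption, not the ACT* equation).
    """
    if items_since_focus is None:
        return resting
    return resting + boost * math.exp(-decay_rate * items_since_focus)


for lag in (None, 0, 1, 2, 4, 8):
    print(lag, round(activation(resting=1.0, boost=2.0, items_since_focus=lag, decay_rate=0.3), 3))
```

Activation is highest at the moment of focus and declines toward, but stays above, the resting level as other items intervene. That graded residue is what repetition tests of activation capacity try to detect.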

The application of this model has resulted in tests of working memory capacity that look quite
distinct from those based on the processing workspace model. Figure 3 illustrates a test developed by
Woltz (1987) to reflect individual differences in activation capacity. In this test, subjects are presented
a series of word pairs and are asked to determine whether or not the words are synonyms.
Occasionally, words are repeated one, two, four, or eight items later. As Figure 3 shows, mean
response time is 1265 ms if neither of the words was shown before, but that time is reduced by 191 ms
if one of the words was encountered on the previous item, and by 107 ms if one of the words was
encountered eight items earlier. The interpretation is that a word encountered even eight items earlier
is still more highly active than it would be at its true resting state and therefore is processed faster.
Woltz argues that individual differences in this response time facilitation effect reflect differences in
activation capacity.

Figure 3. Woltz's (1987) Procedure and Resulting Statistics for Measuring Memory Activation
Capacity.
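The facilitation effect is a difference score, which the sketch below computes from mean response times. The group values are the ones reported above; the per-subject summary (mean facilitation across lags) is one plausible choice offered as an assumption, since the text does not specify how Woltz aggregated the lags into an individual score.

```python
from statistics import mean


def facilitation_by_lag(baseline_rt: float, rt_by_lag: dict) -> dict:
    """Response-time facilitation (ms) at each repetition lag: baseline minus repeated-word RT."""
    return {lag: baseline_rt - rt for lag, rt in rt_by_lag.items()}


def facilitation_score(baseline_rt: float, rt_by_lag: dict) -> float:
    """One possible per-subject summary (an assumption): mean facilitation across lags."""
    return mean(facilitation_by_lag(baseline_rt, rt_by_lag).values())


# Group means from the text: 1265 ms baseline, 191 ms faster at lag 1, 107 ms faster at lag 8.
group = {1: 1265 - 191, 8: 1265 - 107}
print(facilitation_by_lag(1265, group))   # {1: 191, 8: 107}
print(facilitation_score(1265, group))    # 149
```

Applied to each subject's own means, the same function would yield an individual-differences measure of how much residual activation survives across intervening items.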

Given that we can define working memory capacity in two distinct ways, an important next question
is: What is the empirical relationship between the two kinds of measures, and, even more importantly,
what is their relationship to learning? Cognitive analyses of learning tasks (Anderson, 1987; Anderson
& Jeffries, 1985), such as mathematics learning or learning a computer programming language, suggest
that the limiting factor in learning is the working memory bottleneck. But the support for this assertion
is often largely theoretical, based on a rational analysis of learning task requirements, supplemented by
a formal computer simulation of learning processes. An individual differences analysis of the role of
working memory in learning can be a useful supplement to this kind of formal analysis, and it is a fair
test of the theoretical claim (Underwood, 1975). Thus, we have recently begun investigating the
relationship between working memory capacity (as measured by tests such as those displayed in
Figures 2 and 3) and performance in realistic learning contexts. We currently are investigating the
acquisition of electronics troubleshooting (Kyllonen, Stephens, & Woltz, 1988), computer
programming skills (Kyllonen, Soule, & Stephens, 1988), and other procedural learning tasks (Woltz,
1987). In all cases, we find that working memory, as indicated by both the processing workspace and
activation capacity measures, is a strong predictor of learning outcomes. These analyses are beginning
to clarify our understanding of working memory. These studies also suggest that the particular tests of
working memory capacity that we have already developed (Figures 2 and 3) are solid candidates for
inclusion in future testing batteries.
Knowledge

In our four-source framework for cognitive skill assessment, we refer to declarative knowledge and
procedural skills as enablers. It has been argued that the main contribution from cognitive psychology
to the next generation of psychological tests is in how we now can assess the mediators (information
processing speed and working memory capacity) rather than the enablers. The idea behind this
thinking is that existing tests already do an adequate job of sampling the breadth of an individual's
knowledge. For example, existing vocabulary tests probably are fair samples of what a person knows
(although faceted vocabulary tests with a consistent sampling scheme are probably even better;
Anderson & Freebody, 1979; Cronbach, 1942; Marshalek, 1981). Also, the ASVAB includes a number
of subtests (Auto and Shop Knowledge, Mechanical Comprehension, Electrical Knowledge) that are
clearly designed to sample the breadth of technical knowledge a student brings to the test.

Thus, in much of our research, the measurement of knowledge has played a rather small role,
especially when considered against the backdrop of its critical role in current cognitive theories
generally. In experiments conducted to date, we have assessed knowledge primarily as a means of
statistically controlling its effects; our main goal has been to investigate the mediator variables, which is
best done by holding the knowledge effect constant.

Perhaps the reason we have failed to progress in assessing the role of knowledge in learning is that
our learning tasks have purposely been rather domain-independent. It may be that advances in
understanding the role of knowledge will be forthcoming only once we begin our actual complex
learning experiments (described in the next section). Still, a considerable body of cognitive research
conducted over the last 10 years permits some informed speculation.

We propose that an individual's declarative knowledge base may be characterized along four
general dimensions: depth, breadth, accessibility (durability), and organization. Depth refers to the
amount of domain-specific conceptual knowledge possessed by the individual. Conventional
achievement tests, and especially job surveys as they are employed in assessing trainee or apprentice
status, are designed to tap this dimension of declarative knowledge. Breadth refers to the amount of
general factual knowledge available. Current intelligence tests, such as the Wechsler Adult Intelligence
Scale (WAIS), include an Information subtest designed to probe breadth of knowledge. Vocabulary
tests can also be seen as measures of breadth of knowledge. Accessibility refers to the strength of the
knowledge, that is, the likelihood that (and the speed with which) it will be accessed in a situation in
which it could be used. Accessibility is both a general characterization of all knowledge an individual
possesses and a specific parameter of every fact in the knowledge base. Accessibility is also a dynamic
property of specific knowledge, in that it weakens with disuse and grows stronger with practice.
Organization refers to the relations and connections among the facts in the knowledge base. A
considerable body of research in cognitive science has grown around the idea that acquiring expertise
in a domain involves the reorganization of facts in the domain (e.g., Lesgold, 1984).

Various methods have been developed to tap these knowledge dimensions. Clustering and scaling
methods have been used to map the organization of knowledge in numerous domains, such as physics
(Chi, Glaser, & Rees, 1982), biology (Stephens, 1987), computer science (Adelson, 1981), psychology
(Fabricius, Schwanenflugel, Kyllonen, Barclay, & Denton, 1987), and so on. Typically, a student is
asked to judge the similarity of two concepts selected from the domain. Clustering and scaling
methods are then used to capture the underlying model the student used to generate the similarity
judgments.
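As one concrete example of the clustering half of this approach, the sketch below applies average-linkage agglomerative clustering to a single student's pairwise similarity ratings. It does not cover scaling methods such as multidimensional scaling, and the cited studies may have used different algorithms. The concepts, the 1 to 7 rating scale, and the ratings themselves are hypothetical.

```python
from itertools import combinations


def average_linkage(concepts, similarity):
    """Agglomerative clustering of similarity judgments (average linkage).

    similarity maps frozenset({a, b}) to a rating; higher means more similar.
    Returns the sequence of merges as (members, mean similarity) pairs.
    """
    def mean_similarity(left, right):
        ratings = [similarity[frozenset((a, b))] for a in left for b in right]
        return sum(ratings) / len(ratings)

    clusters = [frozenset([c]) for c in concepts]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the highest mean pairwise similarity.
        left, right = max(combinations(clusters, 2), key=lambda pair: mean_similarity(*pair))
        score = mean_similarity(left, right)
        clusters = [c for c in clusters if c not in (left, right)] + [left | right]
        merges.append((tuple(sorted(left | right)), score))
    return merges


# Hypothetical 1-7 similarity ratings from one student for four programming concepts.
ratings = {
    frozenset(("loop", "recursion")): 6,
    frozenset(("array", "list")): 7,
    frozenset(("loop", "array")): 2,
    frozenset(("loop", "list")): 2,
    frozenset(("recursion", "array")): 2,
    frozenset(("recursion", "list")): 2,
}
for members, score in average_linkage(["loop", "recursion", "array", "list"], ratings):
    print(members, score)
```

The merge order is read as the student's organization of the domain. Here the student groups data structures apart from control structures, and comparing such trees between novices and experts is one way to track reorganization.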

There are many ways to tap accessibility of knowledge. We have used the sentence verification
technique extensively (e.g., Tirre, Royer, Greene, & Sinatra, 1987). Learning in the typical training
situation involves listening to a lecture or reading a text, then solving problems based on the material
heard or read. The sentence verification technique is designed to probe the amount of material the
learner was able to successfully encode and store in long-term memory following the listening or
reading episode. The technique requires learners to discriminate between accurate paraphrases of
sentences previously read and paraphrases that are inconsistent with what was read. Other techniques,
such as the cloze procedure (filling in blanks in sentences extracted from the preceding text), have
been used for a similar purpose (Landauer, 1986). We are currently using the sentence verification
technique to track the accumulation of declarative knowledge during the course of short (45-minute)
instructional episodes in computer programming (Kyllonen, Soule, & Stephens, 1988) and
electronics troubleshooting (Kyllonen, Stephens, & Woltz, 1988).
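The text does not say how sentence verification responses are scored. The sketch below shows two conventional options as assumptions: proportion correct, and a signal-detection sensitivity index d' that treats accepting an accurate paraphrase as a hit and accepting an inconsistent one as a false alarm. The log-linear correction (adding 0.5 to counts and 1 to totals) is one common way to keep rates away from 0 and 1. The learner's responses are hypothetical.

```python
from statistics import NormalDist


def score_svt(responses):
    """Score sentence verification responses.

    responses: (is_accurate_paraphrase, judged_accurate) pairs.
    Returns proportion correct and a d' sensitivity index. The log-linear
    correction (add 0.5 to counts, 1 to totals) keeps rates off 0 and 1.
    """
    accurate = [judged for truth, judged in responses if truth]
    inconsistent = [judged for truth, judged in responses if not truth]
    hits = sum(accurate)
    false_alarms = sum(inconsistent)
    correct = hits + (len(inconsistent) - false_alarms)
    hit_rate = (hits + 0.5) / (len(accurate) + 1)
    fa_rate = (false_alarms + 0.5) / (len(inconsistent) + 1)
    z = NormalDist().inv_cdf
    return correct / len(responses), z(hit_rate) - z(fa_rate)


# Hypothetical learner: 8 accurate paraphrases (7 accepted), 8 inconsistent ones (1 accepted).
responses = [(True, True)] * 7 + [(True, False)] + [(False, True)] + [(False, False)] * 7
prop_correct, d_prime = score_svt(responses)
print(prop_correct, round(d_prime, 2))
```

Separating hits from false alarms distinguishes a learner who encoded the material from one who simply accepts most statements, a distinction that proportion correct alone can blur.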

Even the measurement of the depth and breadth dimensions of knowledge may benefit from recent
work in cognitive science. The most innovative recent developments in probing declarative knowledge
have been pursued by researchers concerned with achievement testing (Frederiksen, Lesgold, Glaser,
& Shafto, in press; Glaser, Lesgold, & Lajoie, in press; Haertel, 1985; Lesgold et al., 1987). Glaser et
al. point out that current methods, typically 5-alternative multiple-choice tests, suffer from two key
drawbacks. First, the alternatives cannot possibly accommodate all the possible misconceptions a
student could possess, and thus are of limited diagnostic utility. Second, the alternatives may give away
the answer, as has been shown in other realms.

Glaser et al. discuss the potential of cognitive approaches to knowledge assessment, which, in
contrast, rely primarily on a very detailed analysis of verbal protocols extracted from students struggling
with new material or applying what they have already learned. Analysis of these kinds of protocols has
played a critical role in the development of cognitive science (Ericsson & Simon, 1984) and serves as
the primary basis for what Glaser, Lesgold, Lajoie, et al. (1985) have dubbed cognitive task analysis.
The problem with wholesale adoption of the technique at this time is expense. Protocol analyses are
costly in both subject and interviewer time and are therefore not appropriate for inclusion in a test
battery.

But Glaser et al. suggest an ingenious compromise between conventional and protocol methods. In
their hierarchical menus methodology, students select alternatives from a series of linked menus. For
example, if there are five alternatives in each menu and there are three levels of linked menus, there
can be 5^3 = 125 response alternatives. This is superior to simply presenting 125 alternatives on screen,
for two reasons. First, selecting from among 125 alternatives would impose a severe processing load on
subjects and would introduce nuisance individual-difference variation in strategy selection and
test-taking strategy. Second, the hierarchical arrangement can closely mirror the way in which a student
is thinking about a problem, in a kind of top-down fashion.
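The arithmetic of linked menus is easy to check with a small tree structure. In this sketch, which is a hypothetical representation and not the one used by Glaser et al. or by any CLASS tutor, a menu is a dictionary of labeled choices and `None` marks a fully specified answer. Three levels of five choices give 125 distinct answers, yet at each step the student sees only five.

```python
def uniform_menu(width: int, depth: int):
    """Build a hypothetical menu tree with `width` choices at each of `depth` levels.

    A dict is a menu of labeled choices; None marks a fully specified answer.
    """
    if depth == 0:
        return None
    return {f"option {i + 1}": uniform_menu(width, depth - 1) for i in range(width)}


def count_answers(menu) -> int:
    """Number of fully specified answers reachable from this menu."""
    if menu is None:
        return 1
    return sum(count_answers(sub) for sub in menu.values())


def select(menu, path):
    """Follow a sequence of choices; return the remaining submenu (None once fully specified)."""
    for choice in path:
        menu = menu[choice]
    return menu


menu = uniform_menu(width=5, depth=3)
print(count_answers(menu))                                  # 125
print(select(menu, ["option 2", "option 5", "option 1"]))   # None
print(len(select(menu, ["option 2"])))                      # 5
```

Real menus need not be uniform, and the same functions work unchanged on an irregular tree whose branches reflect how students actually decompose a problem.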

Thus far, this approach to probing an individual's knowledge has been employed in one of the
CLASS tutoring systems. Bridge (Bonar & Cunningham, 1986), which teaches learners how to
program in Pascal, presents general programming problems to be solved. At the top level (the first set
of questions), the alternatives are general categories or general approaches to the problem (e.g., "add
something together" or "keep doing something"). Once the student selects a category, he or she is
presented a list of alternatives that refine the category selection, and so on, until a fully specified
answer is selected. From pilot testing using Air Force subjects, the method has proved general enough
to accommodate the vast majority of potential responses to particular programming problems;
therefore, the approach seems highly promising as a way of assessing knowledge status in the student.

To summarize, although we have not yet fully explored how to probe a learner's declarative
knowledge base, we have taken some important initial steps. It is likely that as we begin further
testing in the more complex tutoring-system environments, the methods described in this section will
be refined further.

Skills 

We define skills, or procedural knowledge as it is referred to in the cognitive science literature, fairly
informally as any unit of knowledge that is typically, or would likely be, represented in production
system simulations in the form of an if-then rule or series of if-then rules. This is any knowledge or
skill the student has that might bear directly on problem solving ("how-to knowledge"). Procedural skill
varies widely along the generality dimension; at the most general level are problem-solving heuristics or
approaches, such as working backward, means-ends analysis, or persisting in the face of uncertainty.
At the opposite end of the continuum are very specific procedures, such as moving the cursor to
position 12, 45 when required to delete a character at position 12, 45.
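The text's most specific example can be written as a pair of if-then rules run by a tiny recognize-act loop. This is a minimal sketch, not a model from any particular production-system architecture. The class and function names are hypothetical, and the "fire the first matching rule" conflict-resolution policy is a simplification; real architectures use richer conflict resolution.

```python
from dataclasses import dataclass
from typing import Callable

TARGET = (12, 45)  # the position named in the text's example


@dataclass
class Rule:
    """An if-then production: condition tests the state, action changes it."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]


def move_cursor(state: dict) -> None:
    state["cursor"] = TARGET


def delete_char(state: dict) -> None:
    state["deleted"].append(state["cursor"])
    state["goal"] = None  # goal satisfied


RULES = [
    # IF the goal is to delete at 12, 45 AND the cursor is elsewhere THEN move the cursor there.
    Rule("move-cursor",
         lambda s: s["goal"] == ("delete", TARGET) and s["cursor"] != TARGET,
         move_cursor),
    # IF the goal is to delete at 12, 45 AND the cursor is there THEN delete the character.
    Rule("delete-char",
         lambda s: s["goal"] == ("delete", TARGET) and s["cursor"] == TARGET,
         delete_char),
]


def run(rules, state, max_cycles=10):
    """Recognize-act cycle: fire the first matching rule until none matches."""
    fired = []
    for _ in range(max_cycles):
        for rule in rules:
            if rule.condition(state):
                rule.action(state)
                fired.append(rule.name)
                break
        else:
            break  # no rule matched: stop
    return fired


state = {"goal": ("delete", TARGET), "cursor": (1, 1), "deleted": []}
print(run(RULES, state), state["cursor"], state["deleted"])
```

Because both conditions name the exact position, these rules apply to only one situation. A general heuristic such as working backward would have conditions that match a vast range of states, which is the generality dimension described above.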

One fairly consistent finding in cognitive research is that although specific procedures are trainable,
general procedures are quite resistant to modification. This finding is certainly not due to a shortage of
attempts to modify general skills. Kulik, Bangert-Drowns, and Kulik (1984) reviewed over 50 studies of
the effects of extensive coaching for the Scholastic Aptitude Test (SAT). They concluded that the
effects, even for long-term training, were quite small (approximately one-sixth to one-third of a
standard deviation, or 17 to 34 points). The results of Venezuela's Project Intelligence (Herrnstein,
Nickerson, de Sanchez, & Swets, 1986) may similarly be seen as somewhat disappointing. Despite an
ambitious project in which domain-free thinking skills were taught 4 days per week, in 45-minute
lessons, for an entire year, the actual changes on standard measures of cognitive skill (intelligence
tests) were quite minuscule (about J sd). These findings should not have come as any great surprise.
Attempts to have students transfer general problem-solving approaches to superficially distinct but
isomorphic problems have repeatedly failed (e.g., Brown & Campione, 1978; Simon &
Hayes, 1976).

On the other hand, there is good evidence for the modifiability of specific skills, especially in 
context. Schoenfeld (1979) has shown how training in mathematical heuristics (e.g., draw a diagram, 
simplify the problem, test the limiting case) can facilitate subsequent problem solving so long as the 
instruction is wedded tightly to the domain material simultaneously being taught. Recent analyses of
transfer of training have shown that skill transfer is excellent and quite predictable when the skills 
transferred are related at some conceptual level to the new skills (Anderson, 1987; Kieras & Bovair, 
1986). 

The implications of these two results for testing purposes are apparent. On the one hand, specific
procedural knowledge is rather easily modifiable and therefore perhaps ought to be trained rather than
tested for, at least in the personnel selection and classification context. Recent work on diagnostic
monitoring (Frederiksen et al., in press; Lesgold et al., 1987) shows how tests can be used to tailor
instruction and are thus appropriate for this purpose. On the other hand, general procedural
knowledge should have an important predictive relationship to learning ability, and it seems to be fairly
immutable. General procedural knowledge, therefore, is an ideal capability to test for in entrance
(selection and classification) testing. It is interesting that researchers from very diverse perspectives,
psychometric (Cattell, 1971), information processing (Sternberg, 1981a), and artificial intelligence
(Schank, 1980), have argued consistently for the importance of the ability to cope with novel problems
as a key aspect of intelligence, and therefore as an ideal candidate for inclusion in aptitude test
batteries.

Do we now test for general procedural knowledge, or general problem-solving skills? As was the
case with declarative knowledge, there certainly are paper-and-pencil tests in existence that would
appear to tap very general problem-solving skill, Raven's Progressive Matrices being an excellent
example. And about 7 years ago, ETS began supplementing its existing Verbal and Quantitative
portions of the Graduate Record Examination with a new test of Analytic ability (Wilson, 1976). The
ASVAB comes close to testing general problem-solving ability with the Arithmetic Reasoning subtest.
This subtest consists of story problems such as "How many 36-passenger buses will it take to carry 144
people?" (DoD, 198 i). Recall that the Arithmetic Reasoning subtest loaded highly on the Working
Memory factor in the Christal (1987) study, which suggests an intriguing research question: What is
the relationship between working memory and procedural skill?

We can think of working memory capacity as mediating the development and efficiency of general
problem-solving strategies. But an alternative view of the relationship between the two constructs
assigns the central role to working memory. Baddeley (19S7) has proposed a model of working
memory consisting of various slave storage subsystems (for storing linguistic information, spatial
information, etc.), along with a central executive that monitors and coordinates the activities of the
subsidiary storage systems. Executive skill, then, is skill in monitoring one's problem-solving processes,
adapting to changing task requirements, successfully executing general problem-solving strategies,
allocating resources where they are needed, and, more generally, changing processing strategy in
accordance with changes in processing demands.
In this way, the executive can be seen as the most important component of working memory. Yet,
though we have a reasonable understanding of how the subsidiary storage systems function, according
to Baddeley the workings of the central executive still remain largely a mystery. An important and
exciting research direction is to begin devising means for measuring executive skill and thereby begin
unraveling that mystery.

Modeling Learning Skills

Learning Skills Taxonomy 

If we can adequately measure knowledge and the various skills associated with the four sources, an
important next step in the research program is to demonstrate the relationship between those scores
and scores generated from a trainee's interaction with a learning task. We believe that learning should
be expressible in terms of (i.e., predictable from) the underlying components, but it is necessary to
show that this is the case.

Much of our research until fairly recently has used grossly simplified learning tasks as criterion
measures against which to validate the new cognitive abilities measures. For example, in the
Kyllonen-Tirre-Christal (1988) study, performance on various paired-associates tests was used as the
criterion; and in other studies, we have employed comparably simple, short-term learning tasks. The
logic underlying this decision is twofold. First, we are concerned with developing rigorous models of the
aptitude-learning-outcome relationship, and simple, short-term learning tasks afford more control over
the instructional environment. But second, we believe that the kind of learning involved in even these
simple tasks is at some fundamental level the same as that involved in more realistic learning situations.
Or, conversely, even apparently complex classroom learning can be analyzed and decomposed into a
series of much simpler learning acts.

If we accept the notion that even complex learning tasks can be broken down into their constituent
learning activities, then it obviously would be useful to specify the nature of those basic learning
activities. One proposal that has been useful in our work, based largely on Anderson's (1987)
three-stage model of skill acquisition, is represented on the right side of Figure 1. The idea is that
cognitive skills develop through an initial engagement of declarative learning processes ("memorizing
the steps"), followed by an engagement of proceduralization processes ("executing the steps"), then
finally refinement processes ("automatizing the steps"). As Figure 4 shows, different performance
measures will be sensitive to the course of skill development at various points along the way. When a
skill is first being learned, many mistakes will be made, and accuracy measures will be the most
sensitive indicators of skill development. Later, when the skill is known, few mistakes will be made, and
performance time measures will be the most sensitive indicators. Still later, performance time will
approach a minimum as the target skill becomes increasingly automatized, but there might still be
considerable variability in whether (and how much) other processing can occur while the target skill is
being executed.

We (Kyllonen & Shute, in press) recently elaborated on this simple taxonomy, proposing that in
addition to the status of the skill (i.e., whether the skill is in a declarative, procedural, or automatic
state, which we identified as the knowledge-type dimension), learning could be classified along three
other dimensions: the learning environment, the domain, and the learner's cognitive style.

The learning environment specifies the nature of the inference process required of the student. The
simplest learning act involves rote memorization. Learning by actively encoding, by deduction, by
analogical reasoning, by refinement through reflection following practice, by induction from
examples, and by observation and discovery involves successively more complex processing on the part
of the learner. The second dimension, the resulting knowledge type, as indicated above, specifies
whether the product of the learning act is a new chunk of declarative knowledge (a new fact or body of
facts) or new procedural knowledge (a rule, a skill, or a mental model). The third dimension, the
domain, refers to whether learning is occurring in a technical, quantitative domain or a more verbal,
non-technical domain. Together, these three dimensions specify a particular kind of learning act. The
fourth dimension, the learner's cognitive style, is a property of the learner rather than of the
instructional situation per se. But we included it in recognition of the possibility that we cannot be
certain, on any task, of what learning skill is being assessed unless we consider how the learner is
approaching the task.

Figure 4. ... Being Measured. The different dependent measures are optimally sensitive to individual
differences at different stages.

Our proposal, which has not yet been put to the test in any sense, is that the taxonomy should prove
useful in two ways. First, it provides a sampling space from which we may draw learning tasks. The
goal of the LAMP effort is to model learning ability using cognitive skill measures; the taxonomy
specifies the range of learning tasks for which we must develop adequate models. Second, in reverse
fashion, the taxonomy specifies the kinds of micro-level learning acts that combine to make complex
learning. This aspect provides a task analysis tool. Our idea is that we can inspect the requirements of
any complex learning situation, in the classroom or in front of a computer, and specify what learning
acts are occurring. Given any instructional exchange, we can find a cell in the taxonomy that represents
that exchange.
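Both uses of the taxonomy, sampling tasks and classifying an exchange, can be sketched with a Cartesian product of its dimensions. The level labels below paraphrase the text. Treating the knowledge-type dimension as three levels (declarative, procedural, automatic) is an assumption, since the text also describes finer product types. Cognitive style is left out because the text treats it as a property of the learner rather than of the learning act. The function names are hypothetical.

```python
import random
from itertools import product

# Level labels paraphrase the text; the exact level sets are assumptions.
ENVIRONMENTS = (
    "rote memorization",
    "active encoding",
    "deduction",
    "analogical reasoning",
    "refinement through reflection after practice",
    "induction from examples",
    "observation and discovery",
)
KNOWLEDGE_TYPES = ("declarative", "procedural", "automatic")
DOMAINS = ("technical-quantitative", "verbal-nontechnical")

# Cognitive style is omitted: it describes the learner, not the learning act.
CELLS = list(product(ENVIRONMENTS, KNOWLEDGE_TYPES, DOMAINS))


def classify(environment: str, knowledge_type: str, domain: str) -> tuple:
    """Task-analysis use: map one instructional exchange to its taxonomy cell."""
    cell = (environment, knowledge_type, domain)
    if cell not in CELLS:
        raise ValueError(f"not a taxonomy cell: {cell}")
    return cell


def sample_tasks(k: int, seed: int = 0) -> list:
    """Sampling-space use: draw k distinct cells from which to build learning tasks."""
    return random.Random(seed).sample(CELLS, k)


print(len(CELLS))  # 7 x 3 x 2 = 42
print(classify("induction from examples", "procedural", "technical-quantitative"))
print(sample_tasks(3))
```

Enumerating the cells explicitly makes it easy to see which combinations a given battery of learning tasks leaves unsampled.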

Complex Learning Assessment (CLASS) 

One potential stumbling block for any program like ours is that it is not easy to monitor progress.
To determine whether our innovative measurement methods are valid predictors of learning success, it
is necessary to observe students engaged in learning. Two approaches have traditionally been taken.
One is to validate the new tests against some criterion reflecting success in operational training, such as
final course grade point average. The benefit of this approach is that inferences from the research are
direct, but there are a number of drawbacks: Data collection is extremely slow, instructor quality is
highly variable and may interact with learner characteristics in affecting learning outcomes, and there is
no allowance for manipulating the learning task in any way so as to allow "what-if" questions regarding
validity (e.g., "If the instructor encouraged more questions, would that differentially affect student
outcomes?").

The second approach is to simplify the learning task such that it is under the experimenter's control
and can be administered within a single session. With complete control over the learning task, one can
ask and test what-if questions easily. Unfortunately, in so modifying the learning task, the researcher
cannot necessarily continue to assume that the instruments shown to be valid in the experimental
context will prove to be valid in predicting success in more realistic learning situations.

Our solution to the validity problem represents a compromise between these two positions. We are
currently designing intelligent computerized tutoring systems to teach computer programming,
electronics troubleshooting, and flight engineering in S6-hour mini-courses (Learning Research &
Development Center, 1987). In addition, we will add new mini-courses over the next several years.
The tutoring systems are being designed to produce a rich variety of indices of the learner's curriculum
knowledge and his or her progress in acquiring the new knowledge and skills being taught. The
tutoring systems are sufficiently flexible that it is easy to modify the instructional strategy and thus
ask what-if questions. The learning involved, however, is not trivial. It has been estimated that 1 hour
of tutored instruction is equivalent to approximately 4 hours of regular classroom instruction
(Anderson, Boyle, & Reiser, 1984); thus, these mini-courses are quite extensive. A major goal of our
current research efforts is to use the taxonomy to generate the most expressive indices of the student's
learning experience.

We envision a broad range of research questions that can be addressed once we begin gathering
data with these kinds of learning indices. First, the indices can serve as alternatives to end-of-course
achievement test scores as criteria for validating new cognitive aptitude tests. An index such as
"probability of remembering an instructional proposition (as a function of the amount of study and
presentation lag)" is more precise and potentially more general than a broad achievement test score.
Such a fine breakdown of the learning experience also permits enhanced analyses among the indices
themselves. For example, we can begin investigating more precise questions concerning the
relationship between initial knowledge acquisition and the subsequent ability to turn that knowledge
into problem-solving skill, or the ability to tune that skill with more problem-solving experience.
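An index of the kind quoted above can be estimated directly from a tutor's trace of test events, as in the following sketch. The record layout (study count, lag in intervening items, recalled or not) and the log entries are hypothetical. A real index might be smoothed or fitted with a model instead of tabulated cell by cell, which is a choice this sketch does not make.

```python
from collections import defaultdict
from fractions import Fraction


def recall_probability(trace):
    """Estimate P(recall) of instructional propositions for each (study count, lag) cell.

    trace: iterable of (study_count, lag, recalled) records, one per test of a proposition.
    """
    tallies = defaultdict(lambda: [0, 0])  # cell -> [times recalled, times tested]
    for study_count, lag, recalled in trace:
        tally = tallies[(study_count, lag)]
        tally[0] += int(recalled)
        tally[1] += 1
    return {cell: Fraction(hits, tests) for cell, (hits, tests) in sorted(tallies.items())}


# Hypothetical tutor log: (times studied, items since last presentation, recalled?)
log = [(1, 2, True), (1, 2, False), (1, 8, False), (1, 8, False),
       (2, 2, True), (2, 2, True), (2, 8, True), (2, 8, False)]
for cell, p in recall_probability(log).items():
    print(cell, p)
```

Tabulated per learner, such cells give a profile of how recall depends on study and lag, which is finer-grained than a single end-of-course score.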

Finally, developing rich profiles of an individual learner's strengths and weaknesses in the form of 
elaborate assemblies of learning indices should permit a reassessment of the aptitude-treatment-interaction
(ATI) idea (Cronbach & Snow, 1977). The inconclusiveness of past ATI research can probably
be traced to the use of global aptitude indices and global learning outcome measures,
along with pragmatic limitations on instructional variation. The tutoring systems being developed
overcome these limitations by generating richer traces of a learner's path through a curriculum, and by 
being sufficiently flexible to allow potentially unlimited variations in how instruction is presented. 
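One conventional way to test for an ATI is to ask whether the slope relating aptitude to outcome differs between instructional treatments. The Python sketch below assumes a single aptitude score, a binary treatment code (0 or 1), and one outcome per learner; it is an illustrative analysis, not the procedure used in this program. With a binary treatment, the difference between the two within-group least-squares slopes equals the aptitude-by-treatment coefficient b3 in the model y = b0 + b1·a + b2·t + b3·a·t, and a value near zero indicates no interaction.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def ati_interaction(aptitude, treatment, outcome):
    """Slope under treatment 1 minus slope under treatment 0 (assumed binary coding)."""
    groups = {0: ([], []), 1: ([], [])}
    for a, t, y in zip(aptitude, treatment, outcome):
        groups[t][0].append(a)
        groups[t][1].append(y)
    return ols_slope(*groups[1]) - ols_slope(*groups[0])


# Noise-free illustrative data built with b0=1, b1=0.5, b2=0.2, b3=0.8.
aptitude = [-1, 0, 1, 2, -1, 0, 1, 2]
treatment = [0, 0, 0, 0, 1, 1, 1, 1]
outcome = [1 + 0.5 * a + 0.2 * t + 0.8 * a * t for a, t in zip(aptitude, treatment)]
interaction = ati_interaction(aptitude, treatment, outcome)
```

Because the example data contain no noise, the estimate recovers the built-in interaction of 0.8 up to floating-point error. The argument in the text is that replacing the global aptitude and outcome scores with fine-grained learning indices, and the two fixed treatments with tutor-generated variations, should make such tests more sensitive.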

IV. SUMMARY AND CONCLUSIONS 

This paper has outlined some of the research activities underway as part of the Air Force's Learning
Abilities Measurement Program (LAMP). The major goal of the project is to devise new models of 
the nature and organization of human abilities, with the long-term goal of applying those models to 
improve current personnel selection and classification systems. 

As an approach to this ambitious undertaking, we have divided the activities of the project into two
categories. The first category is concerned with identifying fundamental learning abilities by
determining how learners differ in their abilities to think, remember, solve problems, and acquire
knowledge and skills. From research already completed, we have established a four-source framework
that assumes that observed learner differences are due to differences in information processing
efficiency; working memory capacity; and the breadth, extent, and accessibility of conceptual knowledge 
and procedural and strategic skills. 

The second category of research activities is concerned with validating new models of learning 
abilities. To do this, we are building a number of computerized intelligent tutoring systems that serve 
as mini-courses in technical areas such as computer programming and electronics troubleshooting. A 
major objective of this part of the program is to develop principles for producing indicators of student
learning progress and achievement. These indicators will serve as the learning outcome measures
against which newly developed learning abilities tests will be evaluated in future validation studies. The
indicators also will be applied in studies that investigate the dynamics of knowledge and skill acquisition
and in studies that attempt to optimize instruction so as to capitalize on and compensate for learner 
strengths and weaknesses. 